Reducing Latency, Power, and Gate Count with the Tensilica Floating-Point FMA
Abstract
Today’s digital signal processing applications, such as radar, echo cancellation, and image processing, demand greater dynamic range and computation accuracy. Floating-point arithmetic units offer better precision, higher dynamic range, and shorter development cycles than fixed-point arithmetic units, and minimizing a design’s time to market is more important than ever. Algorithm developers use MATLAB to develop and test their ideas, which are mostly based on floating-point arithmetic. However, digital signal processor (DSP) programmers port these algorithms to fixed-point arithmetic because floating-point arithmetic units are considerably larger, slower, and more power-hungry than fixed-point units. This porting is not a trivial effort, as programmers must verify the results, including the error rate (accuracy) of the fixed-point algorithm against its floating-point counterpart. Furthermore, fixed-point code often requires more cycles than the floating-point version of the same algorithm. For example, on the Cadence® Tensilica BBE32EP DSP, a 4x4 matrix Cholesky decomposition takes 18 cycles in fixed point but 15 cycles in floating point. As this example illustrates, it makes sense to keep using floating-point computation units when an application requires greater dynamic range and accuracy. To overcome some of the drawbacks of floating-point arithmetic units, Cadence has developed an innovative, patented design.
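To illustrate why floating-point code tends to be shorter and simpler to port than its fixed-point counterpart, the C sketch below contrasts a single multiply-accumulate step written with the standard fma() call against a hand-scaled fixed-point version. The Q15 format, saturation policy, and function names are illustrative assumptions and are not taken from the BBE32EP instruction set.

#include <math.h>
#include <stdint.h>

/* Floating-point multiply-accumulate: one fused operation, no manual
 * scaling, and the full dynamic range of the binary64 format. */
static double mac_float(double acc, double a, double b)
{
    return fma(a, b, acc);          /* single rounding */
}

/* Fixed-point Q15 multiply-accumulate (illustrative): the programmer
 * must track the binary point, rescale the product, and guard against
 * overflow by hand -- the extra work that porting from MATLAB entails. */
static int32_t mac_q15(int32_t acc, int16_t a, int16_t b)
{
    int32_t prod = ((int32_t)a * (int32_t)b) >> 15;  /* Q15 * Q15 -> Q15 */
    int64_t sum  = (int64_t)acc + prod;
    if (sum > INT32_MAX) sum = INT32_MAX;            /* saturate */
    if (sum < INT32_MIN) sum = INT32_MIN;
    return (int32_t)sum;
}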
Similar Resources
Design of Gate-Driven Quasi Floating Bulk OTA-Based Gm–C Filter for PLL Applications
Advances in integrated circuit design have increased the demand for low-voltage portable analog devices, which in turn has raised the need for low-power RF transceivers. A low-power phase-locked loop (PLL) is always desirable to meet this need. This paper deals with the design of a low-power transconductance-capacitance (Gm-...
Mixed-precision Fused Multiply and Add
The standard floating-point fused multiply and add (FMA) computes R=AB+C with a single rounding. This article investigates a variant of this operator where the addend C and the result R are of a larger format, for instance binary64 (double precision), while the multiplier inputs A and B are of a smaller format, for instance binary32 (single precision). With minor modifications, this operator is...
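As an aside on how such a mixed-precision operator behaves, a binary32 × binary32 product has at most 48 significand bits and is therefore exactly representable in binary64, so on a machine with a double-precision FMA the operator can be emulated with a single rounding. A minimal C sketch under that assumption (the function name is ours):

#include <math.h>

/* R = A*B + C with A, B in binary32 and C, R in binary64, one rounding.
 * The float product is exact in double (24 + 24 significand bits <= 53),
 * so the double-precision fma() performs the single final rounding. */
double fma_mixed(float a, float b, double c)
{
    return fma((double)a, (double)b, c);
}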
Analyzing Two-Term Dot Product of Multiplier Using Floating Point and Booth Multiplier
The floating-point two-term dot-product multiplier is referred to as a discrete design. Floating point is widely used to increase accuracy, speed, and performance while reducing delay, area, and power consumption. It is applied in digital signal processing and graphics algorithms. Many floating-point applications aim to reduce area; from the survey, the fuse...
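For context, the two-term dot product in question evaluates a·b + c·d. A purely discrete design rounds each multiply and the add separately, whereas an FMA folds one product into the accumulation and saves a rounding. A minimal C sketch under those assumptions (function names are ours):

#include <math.h>

/* Discrete two-term dot product: two multiplies and an add,
 * three rounding errors in total. */
double dot2_discrete(double a, double b, double c, double d)
{
    return a * b + c * d;
}

/* FMA-based version: c*d is rounded once, then fma(a, b, cd)
 * adds a*b exactly before a single final rounding. */
double dot2_fma(double a, double b, double c, double d)
{
    return fma(a, b, c * d);
}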
Error bounds on complex floating-point multiplication with an FMA
The accuracy analysis of complex floating-point multiplication done by Brent, Percival, and Zimmermann [Math. Comp., 76:1469–1481, 2007] is extended to the case where a fused multiply-add (FMA) operation is available. Considering floating-point arithmetic with rounding to nearest and unit roundoff u, we show that their bound √5 u on the normwise relative error |ẑ/z − 1| of a complex product z c...
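To make the setting concrete, an FMA-based complex product evaluates each component with one ordinary multiply and one fused multiply-add, as in the hedged C sketch below; the error bound quoted above comes from the cited analysis, not from this toy code.

#include <math.h>
#include <complex.h>

/* Complex product (a + ib)(c + id) with an FMA in each component:
 * only one of the two partial products per component is rounded
 * before the final fused operation. */
double complex cmul_fma(double a, double b, double c, double d)
{
    double re = fma(a, c, -(b * d));   /* ac - bd */
    double im = fma(a, d,  b * c);     /* ad + bc */
    return re + im * I;
}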
A Survey on Floating Point Adders
Addition is the most complex operation in a floating-point unit and can cause major delays while requiring significant area. Over the years, the VLSI community has developed many floating-point adder algorithms aimed primarily at reducing overall latency. An efficient floating-point adder design offers major area and performance improvements for FPGAs. This paper studies the impleme...
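As background for why the adder dominates latency, the classic single-path addition algorithm performs exponent comparison, significand alignment, addition, normalization, and rounding in sequence. The toy C model below (our own simplification) handles only positive, normalized operands and truncates instead of rounding; it is meant only to show where the long-latency alignment and normalization shifts come from.

#include <stdint.h>

/* Toy model of floating-point addition for two positive, normalized
 * operands (significand in [1,2) stored with an explicit leading 1 in
 * bit 52).  Signs, subnormals, NaN/Inf, and correct rounding are omitted;
 * the point is the sequence of steps a hardware adder must perform. */
typedef struct { int exp; uint64_t sig; } fp_t;   /* unpacked format */

static fp_t fp_add(fp_t x, fp_t y)
{
    /* 1. Exponent compare and operand swap so that x.exp >= y.exp. */
    if (y.exp > x.exp) { fp_t t = x; x = y; y = t; }

    /* 2. Alignment: right-shift the smaller significand (big shifter). */
    int d = x.exp - y.exp;
    uint64_t ysig = (d < 64) ? (y.sig >> d) : 0;

    /* 3. Significand addition. */
    fp_t r = { x.exp, x.sig + ysig };

    /* 4. Normalization: a carry-out means shifting right by one and
     *    bumping the exponent (the subtraction path would instead need
     *    a leading-zero count and a left shift). */
    if (r.sig >> 53) { r.sig >>= 1; r.exp += 1; }

    /* 5. Rounding would follow here (truncation in this toy model). */
    return r;
}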